
Proximity graphs are used in several areas in which a neighborliness relationship for input data sets is a useful tool in their analysis, and have also received substantial attention from the graph drawing community, as they are a natural way of implicitly representing graphs. However, as a tool for graph representation, proximity graphs have some limitations that may be overcome with suitable generalizations. We introduce a generalization, witness graphs, that encompasses both the goal of more power and flexibility for graph drawing issues and a wider spectrum for neighborhood analysis. We study in detail two concrete examples, both related to Delaunay graphs, and consider as well some problems on stabbing geometric objects and point set discrimination, that can be naturally described in terms of witness graphs.

Comment: 27 pages. JCCGG 200

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Witness Gabriel Graphs

Authors: Aronov, Boris; Dulieu, Muriel; Hurtado, Ferran
Publication date: 01/01/2009

We consider a generalization of the Gabriel graph, the witness Gabriel graph. Given a set of vertices P and a set of witnesses W in the plane, there is an edge ab between two points of P in the witness Gabriel graph GG^-(P,W) if and only if the closed disk with diameter ab does not contain any witness point (besides possibly a and/or b). We study several properties of the witness Gabriel graph, both as a proximity graph and as a new tool in graph drawing.

Comment: 23 pages. EuroCG 200
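The edge rule stated in the abstract above can be illustrated with a brute-force sketch. It tests every pair of points of P against every witness, using the fact that a point w lies in the closed disk with diameter ab exactly when the dot product (a - w)·(b - w) is at most zero. The function names are hypothetical and chosen for this illustration only. This is a naive definition check, not an algorithm from the paper.

```python
from itertools import combinations

Point = tuple[float, float]


def in_closed_diametral_disk(a: Point, b: Point, w: Point) -> bool:
    """True if w lies in the closed disk whose diameter is the segment ab.

    By Thales' theorem this holds exactly when the angle a-w-b is at least
    90 degrees, i.e. when (a - w) . (b - w) <= 0.
    """
    return (a[0] - w[0]) * (b[0] - w[0]) + (a[1] - w[1]) * (b[1] - w[1]) <= 0


def witness_gabriel_graph(points: list[Point], witnesses: list[Point]) -> list[tuple[Point, Point]]:
    """Return the edges ab whose closed diametral disk holds no witness.

    Witnesses that coincide with a or b are ignored, matching the
    "besides possibly a and/or b" clause of the definition.
    Hypothetical helper; runs in O(|P|^2 * |W|) time.
    """
    edges = []
    for a, b in combinations(points, 2):
        blocked = any(
            w != a and w != b and in_closed_diametral_disk(a, b, w)
            for w in witnesses
        )
        if not blocked:
            edges.append((a, b))
    return edges
```

With an empty witness set every pair becomes an edge, so the result is the complete graph on P. Integer coordinates avoid rounding issues in the boundary test.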

arXiv.org e-Print Archive

CiteSeerX

UPCommons. Portal del coneixement obert de la UPC

Proximity Drawings of High-Degree Trees

Authors: Barát, J.; Carmi, P.; Wood, David R.; Eppstein, D.; Hurtado, Ferran; Liotta, Giuseppe
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 18/08/2010

A drawing of a given (abstract) tree that is a minimum spanning tree of the vertex set is considered aesthetically pleasing. However, such a drawing can only exist if the tree has maximum degree at most 6. What can be said for trees of higher degree? We approach this question by supposing that a partition or covering of the tree by subtrees of bounded degree is given. Then we show that if the partition or covering satisfies some natural properties, then there is a drawing of the entire tree such that each of the given subtrees is drawn as a minimum spanning tree of its vertex set.
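The property discussed in the abstract above, that a drawing of a tree is a minimum spanning tree of its vertex set, can be checked on small examples with a sketch like the following. It computes the Euclidean minimum spanning tree with Prim's algorithm and compares it to the given tree edges. The function names are hypothetical, and the check assumes the minimum spanning tree is unique (for example, all pairwise distances distinct). It is not the construction from the paper.

```python
import math


def euclidean_mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph, O(n^2) time.

    Returns the MST as a set of frozensets of point indices.
    """
    n = len(points)
    if n == 0:
        return set()
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection cost to the tree
    parent = [-1] * n       # tree vertex realising that cost
    best[0] = 0.0
    edges = set()
    for _ in range(n):
        # Pick the cheapest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.add(frozenset((parent[u], u)))
        # Relax the connection cost of the remaining vertices through u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def is_mst_drawing(points, tree_edges):
    """True if the tree edges (index pairs) equal the Euclidean MST of points.

    Assumes the MST is unique; with ties a valid MST drawing could be rejected.
    """
    return {frozenset(e) for e in tree_edges} == euclidean_mst_edges(points)
```

For instance, placing a path on collinear points in order gives an MST drawing, while connecting the first point directly to the last does not.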

arXiv.org e-Print Archive

CiteSeerX

Crossref

On k-Convex Polygons

Authors: Aichholzer, Oswin; Aurenhammer, Franz; Demaine, Erik D.; Hurtado, Ferran; Ramos, Pedro; Urrutia, Jorge
Publication date: 21/07/2010

We introduce a notion of k-convexity and explore polygons in the plane that have this property. Polygons which are k-convex can be triangulated with fast yet simple algorithms. However, recognizing them in general is a 3SUM-hard problem. We give a characterization of 2-convex polygons, a particularly interesting class, and show how to recognize them in O(n log n) time. A description of their shape is given as well, which leads to Erdős-Szekeres type results regarding subconfigurations of their vertex sets. Finally, we introduce the concept of generalized geometric permutations, and show that their number can be exponential in the number of 2-convex objects considered.

Comment: 23 pages, 19 figures

arXiv.org e-Print Archive

DSpace@MIT

An O(n log n)-Time Algorithm for the Restricted Scaffold Assignment

Authors: Colannino, Justin; Damian, Mirela; Hurtado, Ferran; Iacono, John; Meijer, Henk; Ramaswami, Suneeta; Toussaint, Godfried
Publication date: 01/01/2005

The assignment problem takes as input two finite point sets S and T and establishes a correspondence between points in S and points in T, such that each point in S maps to exactly one point in T, and each point in T maps to at least one point in S. In this paper we show that this problem has an O(n log n)-time solution, provided that the points in S and T are restricted to lie on a line (linear time, if S and T are presorted).

Comment: 13 pages, 8 figures
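The constraints in the abstract above (every point of S gets exactly one partner in T, every point of T gets at least one partner in S) can be made concrete with a tiny exhaustive search for points on a line. The abstract does not state the objective, so this sketch assumes the cost is the sum of distances |s - t| over all matched pairs. The function name is hypothetical. The search is exponential and is only meant for cross-checking very small inputs; it is not the O(n log n) algorithm of the paper.

```python
from itertools import product


def brute_force_assignment(S, T):
    """Exhaustively find a cheapest map from S onto T for points on a line.

    S and T are lists of numbers (coordinates on the line).
    Returns (cost, mapping) where mapping[i] is the index in T assigned to
    S[i], or None if no valid assignment exists (T empty or |S| < |T|).
    Assumed cost: sum of |s - t| over assigned pairs.
    """
    if not T or len(S) < len(T):
        return None
    best = None
    for mapping in product(range(len(T)), repeat=len(S)):
        if len(set(mapping)) != len(T):
            continue  # some point of T has no partner in S
        cost = sum(abs(s - T[j]) for s, j in zip(S, mapping))
        if best is None or cost < best[0]:
            best = (cost, mapping)
    return best
```

For S = [0, 1, 10] and T = [0, 9] the cheapest valid map sends 0 and 1 to the point 0 and sends 10 to the point 9.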

arXiv.org e-Print Archive

DI-fusion

Villanova University. Falvey Memorial Library: Villanova Digital Library

Spanish sentiment analysis in Twitter at the TASS workshop

Authors: Hurtado Oliver, Lluis Felip; Pla Santamaría, Ferran
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

This paper describes a support vector machine-based approach to different tasks related to sentiment analysis in Twitter for Spanish. We focus on parameter optimization of the models and the combination of several models by means of voting techniques. We evaluate the proposed approach in all the tasks that were defined in the five editions of the TASS workshop, between 2012 and 2016. TASS has become a framework for sentiment analysis tasks that are focused on the Spanish language. We describe our participation in this competition and the results achieved, and then we provide an analysis of and comparison with the best approaches of the teams who participated in all the tasks defined in the TASS workshops. To our knowledge, our results exceed those published to date in the sentiment analysis tasks of the TASS workshops.

This work has been partially funded by the Spanish MINECO and FEDER funds under project ASLP-MULAN: Audio, Speech and Language Processing for Multimedia Analytics, TIN2014-54288-C4-3-R.

Pla Santamaría, F.; Hurtado Oliver, LF. (2018). Spanish sentiment analysis in Twitter at the TASS workshop. Language Resources and Evaluation. 52(2):645-672. https://doi.org/10.1007/s10579-017-9394-7
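The abstract for this paper mentions combining several models by means of voting techniques without giving the scheme. As a rough illustration only, the sketch below assumes plain majority voting over the labels predicted by each model, with ties resolved in favour of the earliest model in the list. The function names and the polarity labels used in the usage note are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most models.

    predictions: non-empty list with one label per model.
    Ties go to the tied label that appears first in the list (an assumed
    tie-breaking rule, not taken from the paper).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


def vote_all(per_model_predictions):
    """Combine predictions tweet by tweet.

    per_model_predictions: one list of labels per model, all of equal length.
    """
    return [majority_vote(list(column)) for column in zip(*per_model_predictions)]
```

For example, three models predicting "P", "N" and "P" for the same tweet yield "P"; with only two models disagreeing, the first model's label wins under the assumed tie rule.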

RiuNet

Language identification of multilingual posts from Twitter: a case study

Authors: Hurtado Oliver, Lluis Felip; Pla Santamaría, Ferran
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/09/2016

The final publication is available at Springer via http://dx.doi.org/10.1007/s10115-016-0997-x

This paper describes a method for handling multi-class and multi-label classification problems based on the support vector machine formalism. This method has been applied to the language identification problem in Twitter. The system evaluation was performed mainly on a Twitter data set developed in the TweetLID workshop. This data set contains bilingual tweets written in the most commonly used Iberian languages (i.e., Spanish, Portuguese, Catalan, Basque, and Galician) as well as the English language. We address the following problems: (1) social media texts. We propose a suitable tokenization that processes the peculiarities of Twitter; (2) multilingual tweets. Since a tweet can belong to more than one language, we need to use a multi-class and multi-label classifier; (3) similar languages. We study the main confusions among similar languages; and (4) unbalanced classes. We propose a threshold-based strategy to favor classes with less data. We have also studied the use of Wikipedia and the addition of new tweets in order to increase the training data set. Additionally, we have tested our system on the Bergsma corpus, a collection of tweets in nine languages, focusing on confusable languages using the Cyrillic, Arabic, and Devanagari alphabets. To our knowledge, we obtained the best results published on the TweetLID data set and results that are in line with the best results published on the Bergsma data set.

This work has been partially funded by the project ASLP-MULAN: Audio, Speech and Language Processing for Multimedia Analytics (MINECO TIN2014-54288-C4-3-R).

Pla Santamaría, F.; Hurtado Oliver, LF. (2016). Language identification of multilingual posts from Twitter: a case study. Knowledge and Information Systems. 51(3):965-989. https://doi.org/10.1007/s10115-016-0997-x
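The abstract for this paper mentions a multi-label classifier and a threshold-based strategy that favors classes with less data, but gives no details. The sketch below shows one way such a decision step could look: each language has its own score cutoff, a label is emitted when its score reaches the cutoff, and a lower cutoff for a rare language makes it easier to select. The function name, the score values, and the fallback to the single best label are all assumptions made for illustration, not the authors' method.

```python
def select_languages(scores, thresholds, default_threshold=0.0):
    """Pick every language whose score reaches its per-class threshold.

    scores: dict mapping language code -> classifier score (hypothetical
        values, e.g. signed distances from per-language SVM decision
        functions).
    thresholds: dict mapping language code -> cutoff; languages not listed
        use default_threshold. Lowering a cutoff favors that language.
    Falls back to the single best-scoring language if none qualifies, so
    every tweet receives at least one label (an assumed design choice).
    """
    if not scores:
        raise ValueError("need at least one score")
    chosen = [
        lang for lang, score in scores.items()
        if score >= thresholds.get(lang, default_threshold)
    ]
    if not chosen:
        chosen = [max(scores, key=scores.get)]
    return sorted(chosen)
```

With scores of 0.8 for "es", -0.1 for "ca" and -0.3 for "gl", and a lowered cutoff of -0.5 for "gl" only, the tweet is labeled both "es" and "gl".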

RiuNet